ABSTRACT

Novelty search has been shown to be a promising approach for the
evolution of controllers for swarm robotics. In existing studies,
however, the experimenter had to craft a domain dependent
behaviour similarity measure to use novelty search in
swarm robotics applications. The reliance on hand-crafted
similarity measures places an additional burden on the experimenter
and introduces a bias in the evolutionary process. In
this paper, we propose and compare two task-independent, 
generic behaviour similarity measures: combined state count 
and sampled average state. The proposed measures use the 
values of sensors and effectors recorded for each individual 
robot of the swarm. The characterisation of the group-level 
behaviour is then obtained by combining the sensor-effector 
values from all the robots. We evaluate the proposed mea- 
sures in an aggregation task and in a resource sharing task. 
We show that the generic measures match the performance 
of domain dependent measures in terms of solution quality. 
Our results indicate that the proposed generic measures op- 
erate as effective behaviour similarity measures, and that it 
is possible to leverage the benefits of novelty search without 
having to craft domain specific similarity measures. 

Categories and Subject Descriptors 

I.2 [Artificial Intelligence]: Robotics; I.2 [Artificial
Intelligence]: Distributed Artificial Intelligence

General Terms 

Algorithms 

Keywords 

Evolutionary swarm robotics, novelty search, behaviour 
characterisation 



1. INTRODUCTION 

Swarm robotics is a promising approach to collective 
robotics, where the group level behaviour emerges from the 
local interactions among agents, and from the interactions
between the agents and the environment. This approach
has the potential to yield several desirable properties in a
group of agents, such as robustness, flexibility, and scalability [4].
However, the complexity stemming from the intricate
dynamics required to produce self-organised behaviour complicates
the hand-design of control systems. Artificial
evolution has been shown capable of exploiting the intricate
dynamics and synthesising self-organised behaviours (see for
example [20, 21, 2, 3]), but the approach carries several issues.
The most prominent issue associated with common
evolutionary techniques is deception [22]. Deception occurs
when the fitness function misguides the search towards local
maxima that do not contain adequate solutions to the
problem. As the complexity of a problem increases, the
fitness landscape typically becomes more rugged and gains
more local maxima [18]. As such, it becomes more difficult
to craft a fitness function that can successfully guide
the search towards the objective, i.e., the evolutionary
process becomes more vulnerable to deception.

Novelty search [14] is a distinctive evolutionary approach
where candidate solutions are rewarded based solely on their
behavioural novelty, with respect to previously evaluated
solutions. In recent work [7, 8], it was shown that novelty
search can avoid deception in the evolution of swarm robotic
systems. Besides not being affected by deception, it was also
shown that novelty search is able to find a greater diversity of
solutions, and the successful solutions were simpler in terms
of neural network complexity, when compared to those found
by fitness-based evolution. But these advantages come at
a price: for novelty search to work, it is necessary to craft
a domain dependent behaviour similarity measure, used to
compute the novelty of the individuals. The results showed
that the choice of the novelty metric has a significant impact
on the performance of novelty search, and can introduce a
significant bias in the evolutionary process.

Previous work has proposed behaviour similarity measures
that are domain independent [9, 6, 17]. These generic
measures can potentially be used to overcome the afore- 
mentioned limitation of novelty search. Generic measures 
are typically based on the sensor and effector values of the 
agents exclusively, and do not rely on domain knowledge pro- 
vided by the experimenter. However, the generic measures 
described in previous works are aimed at single robotic sys- 
tems. In this paper, we study how generic measures can be 
adapted to swarm robotic systems. 

This paper proposes generic behaviour similarity measures
that use the sensor and effector values of the robots of the
swarm to obtain a representation of the typical behaviour
of the swarm as a whole. The measures are evaluated in two
swarm robotics tasks: (i) an aggregation task; and (ii) a task
where robots must share an energy recharging station in order
to survive. Following previous results, novelty search
is used in combination with the fitness function, through a
linear scalarization of the novelty and fitness objectives. NEAT
is used as the underlying neuroevolution method.

The results of our experiments suggest that novelty search 
with the proposed generic similarity measures can match the 
performance of novelty search with domain-dependent sim- 
ilarity measures, regarding the quality of the evolved solu- 
tions. We show that the documented advantages of novelty 
search, such as its capacity to bootstrap evolution and to 
circumvent deception [7], are also present with the use of 
generic measures. 

2. RELATED WORK 

In this section, we describe the novelty search algorithm, 
and how novelty search can be combined with fitness-based 
evolution to improve the effectiveness of the evolutionary 
process. We then discuss the previous applications of
novelty search in evolutionary robotics. We conclude with 
a discussion of the generic behaviour similarity measures 
proposed in previous works. 

2.1 Novelty Search 

Novelty search [14] can be implemented over any evolu- 
tionary algorithm. The distinctive aspect of novelty search 
is how the individuals of the population are scored. Instead 
of being scored according to how well they perform a given 
task - which is typically measured by a fitness function, the 
individuals are scored based on their behavioural novelty - 
which is given by the novelty metric. This metric quanti- 
fies how different an individual is from the other, previously 
evaluated individuals with respect to behaviour. 

To measure how far an individual is from other individuals
in behaviour space, the novelty metric relies on the average
distance of that individual to its k-nearest neighbours,
among the current population and a sample of the previously
seen behaviours (stored in an archive). The behaviour
distance between each two individuals is given by a function 
dist that should be provided by the experimenter. Candi- 
dates from sparse regions of the behaviour space thus tend 
to receive higher novelty scores, thereby creating a constant 
evolutionary pressure towards behavioural innovation. 
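
As an illustration of the novelty metric described above, the following Python sketch scores one behaviour by its mean distance to the k nearest neighbours drawn from the population and the archive. The function names, the list-based data layout, and the default Euclidean `dist` are assumptions made for this example; they are not taken from the original implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length behaviour vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty_score(behaviour, population, archive, k=15, dist=euclidean):
    """Mean distance from `behaviour` to its k nearest neighbours.

    Neighbours come from the current population plus the archive.
    Assumption: `population` does not contain `behaviour` itself.
    """
    candidates = list(population) + list(archive)
    distances = sorted(dist(behaviour, other) for other in candidates)
    nearest = distances[:k]
    # With no neighbours at all there is nothing to compare against.
    return sum(nearest) / len(nearest) if nearest else 0.0
```

A behaviour located in a sparse region has large distances to all of its neighbours and therefore receives a high score, which is the pressure towards innovation mentioned in the text.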

The function dist is typically defined with domain knowl- 
edge. Following this approach, the behaviour of each indi- 
vidual is characterised by a vector of real numbers. The 
experimenter should design the behaviour characterisation 
vector so that it captures behaviour features that are considered
relevant to the problem or task. The behaviour distance
between two individuals is then given as the Euclidean
distance between the corresponding behaviour characterisation
vectors of the individuals. A distinct approach is to use
distance functions that do not rely on domain knowledge.
This approach is the main focus of this paper and is
detailed in Section 2.2.



2.1.1 Combining Novelty and Fitness 

As novelty search is guided by behavioural innovation
alone, its performance can be greatly affected by the size
and shape of the behaviour space. In particular, behaviour
spaces that are vast or contain dimensions not related to
the task can negatively impact the performance of novelty
search [13, 5], because novelty search may spend most of its
time exploring behaviours that are irrelevant to the goal
task. To address this issue, several authors have proposed
techniques that combine novelty with fitness in the evaluation
of the individuals [13, 5, 8, 15, 17].

In our experiments, we use a linear scalarization of the
novelty and fitness objectives [5]. We chose this approach
because it can be used together with NEAT without any further
modifications, and has shown promising results in previous
studies [8]. Linear scalarization of the novelty and fitness
objectives directs the search towards regions with high
fitness in the behaviour space. An individual i is evaluated
to measure both fitness, fit(i), and novelty, nov(i), which
after being normalised (Eq. 1) are combined according to
Eq. 2:



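$$\overline{\mathit{fit}}(i) = \frac{\mathit{fit}(i) - \mathit{fit}_{min}}{\mathit{fit}_{max} - \mathit{fit}_{min}} , \qquad \overline{\mathit{nov}}(i) = \frac{\mathit{nov}(i) - \mathit{nov}_{min}}{\mathit{nov}_{max} - \mathit{nov}_{min}} \tag{1}$$

$$\mathit{score}(i) = (1 - \rho) \cdot \overline{\mathit{fit}}(i) + \rho \cdot \overline{\mathit{nov}}(i) \tag{2}$$

The parameter $\rho$ controls the relative weight of novelty,
and must be specified by the experimenter. $\mathit{fit}_{min}$ and
$\mathit{nov}_{min}$ are the lowest fitness and lowest novelty score in the
current population, and $\mathit{fit}_{max}$ and $\mathit{nov}_{max}$ are the highest
fitness and highest novelty score, respectively.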
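
The normalisation and linear scalarization described above can be sketched in Python as follows. The function names are hypothetical, and mapping a population whose scores are all equal to zero is an assumption of this example, since that degenerate case is not discussed in the text.

```python
def normalise(values):
    """Min-max normalise a list of scores over the current population."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all scores are equal, every individual gets 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(fitness, novelty, rho):
    """Weighted sum of normalised fitness and novelty; rho weights novelty."""
    fit_n = normalise(fitness)
    nov_n = normalise(novelty)
    return [(1.0 - rho) * f + rho * n for f, n in zip(fit_n, nov_n)]
```

Setting `rho` to 0 reduces the score to normalised fitness alone, and setting it to 1 gives pure novelty search.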

2.1.2 Novelty Search in Evolutionary Robotics

Novelty search, and other evolutionary techniques based
on behavioural diversity, have been applied with success to
single robotic systems. Some of these applications include
body-brain co-evolution [11]; biped robot control [14]; robot
navigation in deceptive mazes [14]; sequential light seeking [17];
and a robot ball-collecting task [17]. A comprehensive study of
the use of diversity-based techniques in evolutionary robotics
is presented in [17].

Gomes et al. [7, 8] showed that novelty search can also provide
a valuable contribution to the evolution of controllers
for swarm robotics. In particular, the results showed that 
the use of novelty search circumvented deception and boot- 
strapping problems, and could unveil a broad diversity of 
solutions for the same problem. However, the same studies 
revealed that defining behaviour characterisations for this 
domain can be a delicate endeavour. Since there are in- 
finitely many behavioural possibilities, many of these pos- 
sibilities must be conflated in order to construct a viable 
search space. Excessive conflation, however, can hinder the 
evolution of certain types of solutions, and degrade the per- 
formance of novelty search. Furthermore, the definition of 
the behaviour characterisation adds a human bias to the 
process, which is an aspect that should be minimised in evolutionary
robotics [18].

2.2 Generic Novelty Measures 

Gomez [9] proposes the use of generic measures for as- 
sessing the behaviour similarity between individuals. The 
proposed approach consists of building state-action trajec- 
tories for the agent, i.e., the history of actions of the agent 
through time. These trajectories are then compared, obtaining
a measure of behaviour similarity without the need
to provide domain-specific knowledge. To compare the
sequences of actions, the author evaluates the use of Hamming
distance, relative entropy, and normalised compression
distance (NCD). The experimental setup is based on the
Tartarus problem, and the results show that NCD
offers the best performance, followed closely by the Ham- 
ming distance. NCD is a similarity measure that exploits 
the algorithmic regularities of the sequences, but introduces 
a significant computational overhead. 

To address the difficulty in designing behaviour characterisations
for evolutionary robotics, Doncieux and Mouret [6]
proposed and compared generic behaviour similarity measures
for evolutionary robotics. Any evolutionary robotics
experiment involves robots with actuators and sensors, 
whose values reflect the microscopic behaviour of the robot. 
This notion led to the definition of the following generic 
measures [6]: 

Hamming distance: A vector is built with the sensor and
effector values of the robot, sampled throughout the simulation:

$$\vartheta = \left[ \{ s(t), e(t) \},\ t \in [0, T] \right] , \tag{3}$$

where $s(t)$ and $e(t)$ are the vectors of the sensor and effector
values at time $t$, respectively, and $T$ is the simulation
time. The vector $\vartheta$ is then binarised into $\vartheta_{bin}$ by
transforming each value into either 0 or 1. The similarity measure
is then given as the Hamming distance between the
corresponding $\vartheta_{bin}$ vectors obtained for each individual.

Discrete Fourier Transform: The $\vartheta$ vectors are obtained
for each individual, as in the Hamming distance measure.
Instead of using the complete vectors, however, a Discrete
Fourier Transform (DFT) is used to reduce the dimensionality.
The similarity measure is defined as the Euclidean
distance between the first $n_f$ coefficients of the DFT.

Systematic State Count: Perception-action states are defined
based on the possible combinations of $\vartheta_{bin}$. Relying
on the sensor-effector data, the number of times the
robot was in a particular state is then counted, resulting
in a vector of $n$ integers, $n$ being the number of such
states. The similarity measure is then defined as the mean
element-wise distance between the vectors.
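
To make the first of these measures concrete, the sketch below binarises a sensor-effector trace and compares two traces with the Hamming distance. It assumes that all values are already normalised to [0, 1] and that a fixed threshold of 0.5 is used for binarisation; both are assumptions of this example, as the binarisation rule is not specified here.

```python
def binarise(trace, threshold=0.5):
    """Flatten a trace (one sensor/effector vector per step) into 0/1 values.

    Assumption: values are normalised to [0, 1] and split at `threshold`.
    """
    return [1 if value > threshold else 0 for step in trace for value in step]


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

Because the full trace is kept, the length of the vector grows with the simulation time and with the number of sensors and effectors.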

These methods were evaluated in a ball collection task, 
where the robot had 9 sensors and 3 effectors. The nov- 
elty metric was combined with the fitness function through 
multi-objectivisation. The results showed that the Ham- 
ming distance measure was the most effective, being superior 
to the similarity measure defined with domain knowledge. 
The systematic state count and DFT measures displayed a 
significantly lower performance when compared to the Ham- 
ming distance. 

The Hamming distance similarity measure was further 
tested in [17] . In these experiments, the measure was eval- 
uated in three different tasks (deceptive maze, sequential 
light seeking, ball-collecting robot), and different diversity 
maintenance techniques. When using multi-objectivisation 
of novelty and fitness, the results showed that the generic 
Hamming distance was at least as good as the similarity 
measure manually defined with domain knowledge, regard- 
ing the quality of the evolved solutions. 



3. METHODS 

3.1 Combined State Count 

The proposed Combined State Count is an adaptation of 
Systematic State Count (see Section 2.2). Despite the lower
performance in the experiments of Doncieux & Mouret [6],
when compared to the other generic measures, the concept 
of this method can be directly adapted to swarms of robots. 
As such, it is the starting point of our study. The principle 
is to define states based on the values from the sensors and 
effectors recorded for each robot. Then, the number of times 
the robots of the swarm were in each state is computed. 
There is no discrimination in terms of which robot was in a 
particular state, i.e., the state count at the swarm level is
the sum of the state counts for each robot in the swarm. 

The state counting approach is, however, prone to suffer 
from scalability issues, since the number of states grows ex- 
ponentially with the number of sensors/effectors, and with 
the number of possible values for each sensor/effector. To 
address this issue, we propose modifications to the original
Systematic State Count. Scalability is achieved through the use of
efficient structures for representing states and characterisa- 
tions, and mechanisms for reducing the effective number of 
states. 

Efficient State Count Representation 

Representing each behaviour characterisation as a vector 
with one position for each state (as proposed in [6]) can 
compromise the efficiency of the algorithm if there is a large 
number of states. However, the number of visited states in 
one simulation is only a small fraction of the total number 
of possible states. As such, we can represent each charac- 
terisation as a map from states to counts. The counting 
is normalised according to the size of the swarm, to allow 
fair comparisons between simulations with different swarm 
sizes. The behaviour similarity measure is then given by the 
difference between the state count maps. To calculate the 
characterisation map m for each individual, Algorithm 1 is
used:

Algorithm 1 State count characterisation

  m ← empty map from Int to Float
  for all simulation-steps do
    for all r in robots do
      ϑ_r ← read-state(r)
      ϑ'_r ← discretise(ϑ_r)
      h ← hash(ϑ'_r)
      if m does not contain h then
        m[h] ← 0
      end if
      m[h] ← m[h] + 1 / swarmsize
    end for
  end for
  m ← filter(m)
  return m

The function read-state retrieves the current sensor-effector
state $\vartheta(r)$ for a particular robot $r$:

$$\vartheta(r) = \{ s(r), e(r) \} , \tag{4}$$

where $s(r)$ is the vector of size $n_s$, composed of the values
coming from the $n_s$ sensors of the robot $r$; and $e(r)$ is the
vector composed of the effector values.



The discretised vector $\vartheta'(r)$ is obtained by independently
normalising each element of $\vartheta(r)$ to the interval $[0, K-1]$,
followed by rounding to the nearest integer:

$$\vartheta'_i(r) = \mathrm{round}\left( \frac{\vartheta_i(r) - \vartheta_{i,min}}{\vartheta_{i,max} - \vartheta_{i,min}} \cdot (K - 1) \right) , \tag{5}$$

where $\vartheta_{i,max}$ and $\vartheta_{i,min}$ are respectively the maximum and
minimum values of the i-th sensor/effector, and K is the
number of target partitions. The parameter K has direct
implications on the number of possible states, and it should
be empirically determined. A rule of thumb is to define it
according to the length of $\vartheta$. For most applications, K
values of 2 and 3 are adequate, categorising the value of
each sensor as High/Low, or High/Medium/Low. However,
if the robots have a small number of sensors ($\vartheta$ is relatively
short), higher values of K might be preferred, in order to
operate with more detailed behaviour characterisations.
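
The discretisation step can be illustrated with a short Python sketch. It assumes that each raw value is rescaled to [0, K - 1] and rounded half up; the function name and the choice of rounding half up (rather than another tie-breaking rule) are assumptions of this example.

```python
import math


def discretise(state, mins, maxs, k):
    """Map each sensor/effector value to an integer level in 0 .. k-1.

    `mins` and `maxs` hold the per-sensor value ranges.
    """
    levels = []
    for value, lo, hi in zip(state, mins, maxs):
        scaled = (value - lo) / (hi - lo) * (k - 1)
        # Round half up; Python's built-in round() would round half to even.
        levels.append(int(math.floor(scaled + 0.5)))
    return tuple(levels)
```

With K = 2 every reading collapses to Low (0) or High (1), and with K = 3 to Low, Medium, or High, so a robot with n sensors and effectors has at most K^n distinct states.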

The function hash was implemented with Jenkins' one-at-a-time
hash.¹ The intent of hashing the vector $\vartheta'(r)$ is
twofold. First, it allows lookups of the corresponding entry
in the map m in O(n) time, n being the length of $\vartheta'(r)$.
Second, as different vectors are hashed to different values
(there is a very low probability of collisions), there is no need
to store the $\vartheta'$ vectors, which improves the space complexity of
the algorithm.
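
A minimal Python sketch of the counting loop of Algorithm 1 is given below, together with a 32-bit version of Jenkins' one-at-a-time hash. It assumes that the states have already been discretised into tuples of small non-negative integers (below 256), and it leaves out the final filtering step. The function names and the data layout are assumptions of this example.

```python
def one_at_a_time(data: bytes) -> int:
    """Jenkins' one-at-a-time hash, truncated to 32 bits."""
    h = 0
    for byte in data:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


def state_count(steps, swarm_size):
    """Build the map from hashed state to count, normalised by swarm size.

    `steps` holds, for each simulation step, one discretised state tuple
    per robot. Each tuple element must fit in one byte.
    """
    counts = {}
    for robots in steps:
        for state in robots:
            h = one_at_a_time(bytes(state))
            counts[h] = counts.get(h, 0.0) + 1.0 / swarm_size
    return counts
```

Since each robot contributes 1 / swarm_size per step, the counts of a full characterisation sum to the number of simulation steps regardless of the size of the swarm.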

Reducing the Number of States 

The function filter eliminates the least observed states, in 
order to improve the efficiency of the algorithm. Preliminary 
results revealed that robots tend to spend most of their time 
in a small subset of the state space. Most of the states are 
visited only in one or a few simulation steps. As such, elim- 
inating these states from the behaviour characterisation can 
significantly improve the efficiency of the algorithm, practi- 
cally without compromising the accuracy of the characteri- 
sation. The function filter removes from the characterisation 
the states where the robots spent less than T% of the time: 



$$m' = \left\{ (h, c) \in m \;:\; \frac{c}{\sum_{(h_j, c_j) \in m} c_j} \geq T \right\} \tag{6}$$



The constant T should be empirically determined. In our 
experiments, a value of only 1% was enough to drastically 
reduce the number of states in each characterisation. For in- 
stance, the preliminary results showed that on average 99% 
of the simulation time was spent on only 10% of the visited 
states. 
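
The filtering step can be sketched in Python as follows. The threshold is expressed here as a fraction (0.01 for 1%), and states whose share of the total count is exactly equal to the threshold are kept; both conventions are assumptions of this example.

```python
def filter_states(counts, threshold=0.01):
    """Drop states whose share of the total count is below `threshold`."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {h: c for h, c in counts.items() if c / total >= threshold}
```

For example, a state visited during 0.5% of the simulation is removed with the 1% threshold, while the dominant states are left untouched.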

Distance Between Characterisations 

To calculate the distance between two characterisations we 
chose to use the Bray-Curtis dissimilarity, a well-known mea- 
surement for quantifying the difference between samples of 
abundance data. Bray-Curtis is a modified Manhattan mea- 
sure, where the summed differences between the variables 
are standardised by the summed variables of the samples. 
This measure lies in the range from 0 to 1. A value of 0
indicates that the two samples have the same composition,
while a value of 1 means the two samples do not share any 
element. 

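Adapting the Bray-Curtis dissimilarity to behaviour characterisations,
the difference b between two characterisations
$m_1$ and $m_2$ is given by:

$$b(m_1, m_2) = \frac{\sum_{i \in m_1 \cap m_2} \left| m_1[i] - m_2[i] \right| + \sum_{i \in m_1 \setminus m_2} m_1[i] + \sum_{i \in m_2 \setminus m_1} m_2[i]}{\sum_{i \in m_1} m_1[i] + \sum_{i \in m_2} m_2[i]} \tag{7}$$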
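
The Bray-Curtis dissimilarity between two state count maps can be sketched in Python as below, treating a state that is missing from one map as having a count of zero in that map. The function name and the return value of 0 for two empty maps are assumptions of this example.

```python
def bray_curtis(m1, m2):
    """Bray-Curtis dissimilarity between two sparse state count maps."""
    keys = set(m1) | set(m2)
    # A state absent from one map contributes its full count to the numerator.
    numerator = sum(abs(m1.get(k, 0.0) - m2.get(k, 0.0)) for k in keys)
    denominator = sum(m1.values()) + sum(m2.values())
    return numerator / denominator if denominator else 0.0
```

Two characterisations that visit exactly the same states with the same counts are at distance 0, and two characterisations with no state in common are at distance 1.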

3.2 Sampled Average State 

The second similarity measure relies on the principles of 
the Hamming distance measure (see Section 2.2), which was
one of the most successful generic similarity measures in previous
work with single robotic systems [6, 17]. However,
this measure relies on the full description of the sensor-effector
states of the robot through time. As such, it cannot
be directly used with swarms of robots because (i) it
would not scale with the number of robots, and (ii) the be- 
haviour of an individual robot in a swarm often has a sig- 
nificant stochastic component. To overcome these issues, we 
propose the following modifications: 

• The state of the swarm at a given instant is the average
of the sensor-effector states of each robot. This allows
scalability with respect to the size of the swarm.

• The state of the swarm is averaged over a certain time
window. This reduces the sensitivity to the initial conditions,
and to the stochastic nature of the individual
robots' behaviour.

The characterisation of an individual is given by:

$$\vartheta = \left[ \{ \bar{v}_1(w), \ldots, \bar{v}_n(w) \},\ w \in [0, W[ \right] , \tag{8}$$

where $W$ is the number of time windows, $n$ is the number of
sensors and effectors, and $\bar{v}_i(w)$ is the
average value of the i-th sensor/effector over the w-th time
window:

$$\bar{v}_i(w) = \frac{W}{T} \sum_{t = wT/W}^{(w+1)T/W} \frac{1}{R} \sum_{r=0}^{R-1} v'_{i,r}(t) , \tag{9}$$

$T$ is the total simulation time, $R$ the number of robots, and
$v'_{i,r}(t)$ is the normalised value of the i-th sensor/effector of
the robot $r$, at instant $t$:

$$v'_{i,r}(t) = \frac{v_{i,r}(t) - v_{i,min}}{v_{i,max} - v_{i,min}} , \tag{10}$$



¹ http://www.burtleburtle.net/bob/hash/doobs.html



$v_{i,max}$ and $v_{i,min}$ are the maximum and minimum values of
the i-th sensor/effector, respectively.

The distance between two characterisations $\vartheta_1$ and $\vartheta_2$ is
then given by the Manhattan distance between the vectors:

$$d_{man}(\vartheta_1, \vartheta_2) = \sum_i \left| \vartheta_1[i] - \vartheta_2[i] \right| . \tag{11}$$
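
The sampled average state characterisation and its distance can be sketched in Python as follows. The sketch assumes a nested-list trace indexed as `trace[t][r][i]` (step, robot, sensor/effector), a number of steps that is divisible by the number of windows, and non-overlapping windows; these layout choices and the function names are assumptions of this example.

```python
def sampled_average_state(trace, mins, maxs, windows):
    """Average normalised sensor/effector values per time window.

    trace[t][r][i] is the raw value of sensor/effector i of robot r at step t.
    The result lists, window by window, one average per sensor/effector.
    """
    size = len(trace) // windows  # assumption: steps divisible by windows
    characterisation = []
    for w in range(windows):
        chunk = trace[w * size:(w + 1) * size]
        for i in range(len(mins)):
            total, samples = 0.0, 0
            for robots in chunk:
                for state in robots:
                    total += (state[i] - mins[i]) / (maxs[i] - mins[i])
                    samples += 1
            characterisation.append(total / samples)
    return characterisation


def manhattan(a, b):
    """Manhattan distance between two equal-length characterisations."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

The length of the resulting vector is the number of windows times the number of sensors and effectors, so it does not depend on the size of the swarm.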

4. EXPERIMENTAL SETUP 

The proposed generic similarity measures are evaluated 
over two swarm robotic tasks: aggregation and resource 
sharing. The generic measures are compared with domain 
dependent measures, and with fitness-based evolution. 

Our experimental framework is based on the Simbad 3D Robot
Simulator [10] for the robotic simulations. In both tasks, the
environment is a 3 m by 3 m square arena bounded by walls. 
The swarms are homogeneous. Each robot is modelled after 
the e-puck, but with modifications to the sensor setup. Each
robot is circular with a diameter of 8 cm, and is equipped
with a differential drive capable of delivering speeds of up to
12 cm/s. The local on-board controllers are recurrent neural
networks. The inputs of the neural networks are the
normalised values of the sensors of the robot, and there are 
three outputs: one to control each of the two motors, and 
one dedicated to completely halt the movement of the robot. 
Each simulation lasts for 2500 simulation steps, which cor- 
responds to 250 s of simulated time. 

4.1 Aggregation Task 

Aggregation is a commonly studied task in swarm 
robotics [20, 2]. In this task, a dispersed robot swarm must
form a single cluster at any point of the arena. The swarm
has a fixed size of 7 robots. Each robot is equipped with 
(i) 8 IR sensors evenly distributed around its chassis for the 
detection of obstacles (walls or other robots) within a range 
of 10 cm; (ii) 8 IR sensors dedicated to the detection of other 
robots within a range of 25 cm; and (iii) a sensor that re- 
turns the percentage of nearby robots (within a radius of 
25 cm), relative to the swarm size. 

The fitness function F a is defined as the average distance 
of the robots to the centre of mass of the swarm, measured 
at the last instant of the simulation: 



$$F_a = \frac{1}{N} \sum_{i=1}^{N} \mathit{dist}(R_T, r_{i,T}) , \tag{12}$$

where $R_T$ is the centre of mass at the last instant of the simulation,
$N$ is the number of robots, and $r_{i,T}$ is the position of robot $i$
at that instant. The distance values
are normalised to [0, 1].

The domain dependent behaviour characterisation, used
as benchmark, is based on the average distance to the centre
of mass of the swarm, and the number of clusters, sampled
throughout the simulation. Considering a simulation
with N robots and T temporal samples, the characterisation
$b_a$ is given by:

$$b_a = \{ cm, cl \}$$

$$cm = \frac{1}{N} \left[ \sum_{i=1}^{N} d(R_1, r_{i,1}), \ldots, \sum_{i=1}^{N} d(R_T, r_{i,T}) \right] \tag{13}$$

$$cl = \frac{1}{N} \left[ \mathit{clusterCount}(1), \ldots, \mathit{clusterCount}(T) \right]$$

where $R_t$ is the centre of mass at time t, and $r_{i,t}$ is the position
of robot i at time t. The function d gives the distance
normalised to the range [0, 1]. The function clusterCount
returns the number of robot clusters. Two robots belong to
the same cluster if the distance between them is less than
the robot IR sensor range (25 cm).
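
One possible reading of clusterCount is sketched below in Python: robots are linked when they are closer than the sensor range, and clusters are the connected components of the resulting graph. Treating cluster membership as transitive in this way is an assumption of this example, as are the function name and the use of positions in metres.

```python
import math


def cluster_count(positions, link_range=0.25):
    """Count groups of robots connected by links shorter than `link_range`.

    `positions` is a list of (x, y) coordinates in metres.
    """
    seen = [False] * len(positions)
    clusters = 0
    for start in range(len(positions)):
        if seen[start]:
            continue
        clusters += 1
        seen[start] = True
        stack = [start]
        # Depth-first traversal over robots reachable from `start`.
        while stack:
            a = stack.pop()
            for b in range(len(positions)):
                if not seen[b] and math.dist(positions[a], positions[b]) < link_range:
                    seen[b] = True
                    stack.append(b)
    return clusters
```

Under this reading, three robots in a line spaced 20 cm apart form a single cluster even though the two outer robots are 40 cm from each other.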

4.2 Resource Sharing Task 

In this task, the swarm must coordinate in order to allow
each member periodic access to a single battery charging
station. The robots should first find the charging station,
and then effectively share the station to ensure the survival
of all the robots in the swarm. The charging station can
only hold one robot at a time.

Our experiments use a group of 3 robots. Each robot has 
(i) 8 IR sensors for the detection of obstacles up to a range 
of 10 cm; (ii) 8 sensors dedicated to the detection of other 
robots up to a range of 25 cm; (iii) 8 sensors for the detection 
of the charging station up to a range of 1 m; (iv) a binary
sensor that indicates if the robot is over the charging station;
and (v) a proprioceptive sensor that reads the current energy 
level of the robot. 

Each robot starts with full energy (1000 units), and spends 
energy at a rate proportional to motor usage: a robot spends 
5 units per second when motors are off, and 10 units of en- 
ergy per second when motors propel the robot at its maxi- 
mum speed. The charging station is placed in the centre of 
the arena, and charges a robot at a rate of 100 units of en- 
ergy per second. The robots have to be completely stopped 
in order to charge. 
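
The energy model above implies that no robot can survive a full 250 s simulation without recharging, which the following Python sketch makes explicit. It assumes that the consumption rate is interpolated linearly between the idle rate and the full-speed rate; the function names and the 0 to 1 scale for motor usage are assumptions of this example.

```python
def energy_rate(motor_usage, idle_rate=5.0, max_rate=10.0):
    """Energy units spent per second for a motor usage between 0 and 1.

    Assumption: linear interpolation between the idle and full-speed rates.
    """
    if not 0.0 <= motor_usage <= 1.0:
        raise ValueError("motor_usage must be in [0, 1]")
    return idle_rate + (max_rate - idle_rate) * motor_usage


def survival_time(motor_usage, capacity=1000.0):
    """Seconds until a fully charged robot runs out of energy."""
    return capacity / energy_rate(motor_usage)
```

A motionless robot lasts 200 s and a robot moving at full speed lasts 100 s, both shorter than the simulation, whereas the station restores a full charge in 10 s at 100 units per second.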

The fitness function F s used to evaluate the controllers is 
a linear combination of the number of robots alive at the 
end of the simulation and the average energy of the robots 
throughout the entire simulation: 



$$F_s = 0.9 \cdot \frac{|a|}{N} + 0.1 \cdot \frac{\sum_{t=1}^{T} \sum_{i=1}^{N} e_{i,t}}{T \cdot N \cdot e_{max}} , \tag{14}$$

where $|a|$ is the number of robots still alive at the end of
the simulation, T is the length of the simulation, N is the
number of robots in the swarm, $e_{i,t}$ is the energy of robot
i at time t, and $e_{max}$ is the maximum energy of a robot. The
second term of $F_s$, concerning the average energy, is included
to differentiate solutions where the same number of robots
survive.
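
As a worked check of this fitness function, the Python sketch below evaluates it for a swarm that never moves. Only the weight 0.9 of the survival term is legible in this copy; the weight of 0.1 for the energy term is an assumption of this example (chosen so that the weights sum to one), as are the function name, the per-step energy log, and the rule that a robot is alive while its energy is above zero.

```python
def resource_sharing_fitness(energy_log, e_max=1000.0, w_alive=0.9, w_energy=0.1):
    """Weighted sum of the surviving fraction and the mean normalised energy.

    energy_log[t][i] is the energy of robot i at step t.
    Assumptions: w_energy = 0.1, and a robot is alive while energy > 0.
    """
    steps = len(energy_log)
    n = len(energy_log[0])
    alive = sum(1 for energy in energy_log[-1] if energy > 0)
    mean_energy = sum(sum(row) for row in energy_log) / (steps * n * e_max)
    return w_alive * alive / n + w_energy * mean_energy


# Three motionless robots: 5 units/s at 10 steps per second is 0.5 per step.
idle_log = [[max(0.0, 1000.0 - 0.5 * t)] * 3 for t in range(1, 2501)]
idle_fitness = resource_sharing_fitness(idle_log)
```

Under these assumptions `idle_fitness` comes out very close to 0.04: all robots are dead at the end, and the mean energy over the run is roughly 40% of the maximum.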

The domain dependent behaviour characterisation is an 
extension of the characterisation used in previous experiments
with this task [8]. The characterisation is a vector of
length four, composed of the following behavioural features
that are related to the task: (i) the number of robots that
reached the end of the simulation alive; (ii) the average en- 
ergy of the alive robots throughout the simulation; (iii) the 
average movement of all alive robots; and (iv) the average 
distance of all alive robots to the charging station. Each of 
these elements is normalised to [0,1]. 

4.3 Configuration of the Algorithms 

NEAT [19] is used as the underlying neuroevolution algo- 
rithm. NEAT is widely used, and one of the most successful 
neuroevolution approaches developed to date. We use the 
implementation provided in NEAT4J. The parameters for
NEAT were the same in all experiments: recurrent links are
allowed, crossover rate 25%, mutation rate 10%, population
size 200. The remaining parameters were assigned
their default value in the NEAT4J implementation. 

The implementation of novelty search follows the descrip- 
tion in [14]. We used a k value of 15 nearest neighbours, 
and the individuals are added to the novelty archive with a 
probability of 2% [12]. The size of the archive is bounded 
to 500 individuals. When the archive is full, individuals are 
randomly removed as needed. 
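
The archive policy described above can be sketched in Python as a small class. The class and method names are hypothetical, and removing exactly one uniformly chosen entry when a new behaviour arrives at a full archive is one possible reading of "randomly removed as needed".

```python
import random


class NoveltyArchive:
    """Bounded archive with probabilistic insertion and random eviction."""

    def __init__(self, capacity=500, add_probability=0.02, rng=None):
        self.capacity = capacity
        self.add_probability = add_probability
        self.rng = rng or random.Random()
        self.items = []

    def maybe_add(self, behaviour):
        """Add `behaviour` with the configured probability; report success."""
        if self.rng.random() >= self.add_probability:
            return False
        if len(self.items) >= self.capacity:
            # Assumption: evict one uniformly chosen entry to make room.
            self.items.pop(self.rng.randrange(len(self.items)))
        self.items.append(behaviour)
        return True
```

With a population of 200 and an insertion probability of 2%, about four individuals are expected to enter the archive per generation, so the bound of 500 would be reached after roughly 125 generations.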

Novelty search is combined with the fitness function
through a linear scalarization of the novelty and fitness
objectives (see Section 2.1.1). In all novelty search experiments,
the novelty weight ρ was set to 0.7, which means that the
score of each individual is based on 70% of the novelty score
and 30% of the fitness score. This value was empirically
chosen, and is in agreement with previous experiments [8].

For the combined state count measure, the filter threshold
T was set to 1% in all experiments, and the discretisation
level K was set to 3. For the sampled average state measure,
three values of W were tested: 1, 10, and 50, which
correspond to time windows of 250 s, 25 s, and 5 s, respectively.
In both generic similarity measures, the values coming
from the sensor arrays (composed of 8 sensors for the
detection of obstacles, other robots, or the charging station)
were compressed into four values. These four values represent
the closest distance measured at the front, left,
right, and back of the robot. This compression was done to reduce the
number of states (in the combined state count measure), and
to reduce the length of the characterisation (in the sampled
average state measure).

Each controller was evaluated in 10 simulations, randomly 
varying the initial positions and orientations of the robots. 
The fitness scores obtained in each simulation are combined 
into a single value using the harmonic mean, as advocated
in [j] . The behaviour characterisations obtained in the mul- 
tiple simulations are also merged in a single one through 
an element-wise average (in the domain dependent mea- 
sures and sampled average state), and by summing the state 
counts (in combined state count). The best individuals found 
in each generation were post-evaluated with 50 simulations, 
in order to attain more reliable statistics. 
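
The aggregation of results across the simulations of one controller can be sketched in Python as follows. The function names are hypothetical, and fitness scores are assumed to be non-negative, as required by `statistics.harmonic_mean`.

```python
from statistics import harmonic_mean


def combine_fitness(scores):
    """Harmonic mean of the fitness scores from repeated simulations."""
    return harmonic_mean(scores)


def merge_vectors(vectors):
    """Element-wise average of equal-length characterisation vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def merge_state_counts(maps):
    """Sum state count maps; a state missing from a map counts as zero."""
    merged = {}
    for counts in maps:
        for state, count in counts.items():
            merged[state] = merged.get(state, 0.0) + count
    return merged
```

The harmonic mean is pulled towards the lowest scores, so a controller that performs poorly in some of its simulations is ranked below one with the same arithmetic mean but more consistent results.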

5. RESULTS 

Each evolutionary method was evaluated in 10 independent
evolutionary runs. The parameters of each method were set as
specified in Section 4.3. The following treatments were applied
to each task:

SC      Combined state count
AS-1    Sampled average state with W = 1
AS-10   Sampled average state with W = 10
AS-50   Sampled average state with W = 50
DD      Novelty with domain dependent similarity measure
Fit     Fitness-based evolution

The quality of the solutions evolved with each evolutionary
method is depicted in Figure 1. The boxplots represent
the highest fitness score found until a given generation, in 
each evolutionary run of each treatment. The depicted re- 
sults are further explained next. 

5.1 Aggregation 

As the results show (Figure 1, Aggregation), the fitness
function is not deceptive, as fitness-based evolution can almost
always reach good quality solutions. The most notable
advantage of novelty search is its capacity to avoid
deception. However, previous work [7] has shown that even
in non-deceptive swarm robotic tasks, novelty search can
offer a number of advantages. As such, it is still valuable
to analyse the performance of novelty search with generic 
behaviour similarity measures in this non-deceptive task. 

In early stages of evolution (at generation 20), novelty 
search has an advantage over fitness-based evolution, con- 
firming that novelty search quickly bootstraps the evolu- 
tionary process ff\ [16] . All similarity measures, except for 
state count, were superior to fitness-based evolution (p-value
< 0.05, Mann-Whitney U test).
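
For reference, the kind of comparison reported here can be sketched with the standard library alone. The following is a minimal sketch of a two-sided Mann-Whitney U test that uses the normal approximation without tie or continuity correction, so the p-value is only approximate for small samples; a statistics library would normally be used instead. The function name and the sample values are hypothetical and are not results from the experiments.

```python
import math


def mann_whitney_u(a, b):
    """Return (U, approximate two-sided p-value) for samples a and b.

    Ties receive average ranks. The p-value comes from the normal
    approximation, without tie or continuity correction.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        start = end + 1
    n_a, n_b = len(a), len(b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean = n_a * n_b / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    p = math.erfc(abs(u - mean) / sd / math.sqrt(2))
    return u, p


# Hypothetical best-fitness values from five runs of two treatments.
novelty = [0.82, 0.85, 0.80, 0.88, 0.84]
fitness_based = [0.60, 0.65, 0.58, 0.70, 0.62]
u, p = mann_whitney_u(novelty, fitness_based)
significant = p < 0.05
```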

Around the middle of the evolution (generation 75), the
differences between the multiple treatments are less pronounced.
By the end of the evolution, the domain dependent
similarity measure is only superior to the state count measure
(p-value < 0.05). This absence of significant difference
between treatments is actually a promising result. Previous
work FF] has shown that when the behaviour similarity
measure is poorly defined, the performance of novelty search
tends to degrade significantly, regarding the quality of the
solutions evolved. In our experiment, the generic measures
yielded results similar to fitness-based evolution and to the
domain dependent measure, which suggests that the generic
measures are indeed acting as effective behaviour similarity
measures.

Figure 1: Performance comparison of the evolutionary
treatments in both tasks, regarding the highest
fitness score achieved at different stages of the
evolutionary process. The boxplots represent the
distribution of the fitness scores obtained in the 10
evolutionary runs of each treatment.

5.2 Resource Sharing 

As previous experiments have shown [8], the resource sharing
task is inherently deceptive. In particular, fitness-based
evolution tends to get stuck in two local maxima: (i) the
robots do not move at all in order to conserve energy and
survive longer, and as a consequence, they cannot find the
charging station and all the robots run out of energy (fitness
score around 0.04); and (ii) when a robot finds the charging
station, it occupies it and never leaves, condemning the
other robots (fitness score around 0.38). The deceptiveness
of this task makes it especially well suited to novelty
search. As such, this task is a good benchmark to evaluate
whether the behaviour similarity measures are capable of avoiding
deception and guiding evolution towards good solutions.

At the early stages of evolution (generation 50, see Figure 1,
Resource sharing), almost all runs of fitness-based
evolution are still stuck in the local maximum where the
robots do not move. On the other hand, all treatments
based on behaviour novelty could successfully bootstrap the
evolution. At this early stage, there are still no significant
differences between the novelty-based treatments. By the
middle of the evolutionary process, the domain dependent
similarity measure stands out, being superior to all the other
treatments (p-value < 0.05, Mann-Whitney U test), except for
AS-50. There are no statistically significant differences between
generic similarity measures at this stage.

Looking at the best fitness scores achieved in the whole
evolution (generation 250), the superiority of the domain
dependent similarity measure holds. However, all novelty-based
treatments were superior to fitness-based evolution
(p-value < 0.05), and all reached high fitness scores
more or less consistently. Regarding the generic similarity measures,
the AS-50 treatment stands out, being significantly superior
to SC and AS-1 (p-value < 0.05).

5.3 Combined State Count 

In both tasks, the combined state count measure was the 
least effective generic measure. Nevertheless, the perfor- 
mance was close to the sampled average state, which con- 
trasts with the results in Rjj. To shed some light on the 
inferior performance of combined state count, we analysed 
the sensor-effector states that are visited by each individual
(Figure 2).

Figure 2: Average number of sensor-effector states
visited by each population individual (after the filtering
step), compared with the average number of
states that each individual shares with the current
population and the novelty archive.

The increasing average number of states reflects the increasing
complexity of the solutions throughout the evolution.
However, the average number of common states does not
follow this trend. Since the distance between two state count
characterisations is essentially determined by the states they
share, this distance can lose accuracy if the characterisations
share few states. In the extreme case, if no states are shared,
the distance value is always the same.
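
The saturation effect can be illustrated with a small sketch. The distance used below (Manhattan distance between state counts normalised to frequencies) is an assumption chosen for illustration and is not necessarily the metric used in the experiments; the function name and the characterisations are likewise hypothetical.

```python
def state_count_distance(counts_a, counts_b):
    """Manhattan distance between normalised state-count vectors.

    Assumed metric, for illustration only.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    states = set(counts_a) | set(counts_b)
    return sum(
        abs(counts_a.get(s, 0) / total_a - counts_b.get(s, 0) / total_b)
        for s in states
    )


# Hypothetical characterisations: each key is a discretised
# sensor-effector state, each value is the number of visits.
a = {(0, 0, 1): 8, (0, 1, 1): 2}
b = {(1, 0, 0): 5, (1, 1, 0): 5}  # shares no state with a
c = {(1, 1, 1): 9, (0, 1, 0): 1}  # shares no state with a either
d = {(0, 0, 1): 5, (1, 0, 0): 5}  # shares one state with a

d_ab = state_count_distance(a, b)
d_ac = state_count_distance(a, c)
d_ad = state_count_distance(a, d)
```

Under this assumed metric, d_ab and d_ac both take the maximum value of 2, whatever the states in b and c are, whereas d_ad is smaller because one state is shared. The distance therefore carries no information about how similar the unshared states are.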

To overcome this issue, we suggest that the similarity be- 
tween states should also be considered in the distance metric, 
besides the count of each state. This way, the distance will 
maintain its accuracy, regardless of the number of shared 
states. Further studies are required in order to assess the 
viability of this approach. 

5.4 Sampled Average State 

Regarding the sampled average state technique, the most
important factor to study is the influence of the parameter
W. This parameter controls the length of the characterisation
and how accurately it captures the temporal component
of the robots' behaviour. In the aggregation task, there was
no significant difference among the treatments with different
W values (at the 0.05 significance level). On the other hand,
in the resource sharing task there is a trend in the results: the higher
the W, the better the performance of the evolutionary process,
regarding the quality of the solutions. The treatment
with W = 50 delivers significantly higher fitness scores than
the treatment with W = 1 (p-value < 0.05).
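
The role of W can be made concrete with a minimal sketch. It assumes that the simulation log is split into W consecutive time windows and that, within each window, the sensor-effector values are averaged over all robots and time steps; the function name, the log layout and the example values are hypothetical.

```python
def sampled_average_state(log, windows):
    """Characterise a swarm run as `windows` averaged sensor-effector vectors.

    log[t][r] holds the sensor-effector values of robot r at time step t.
    The result has windows * (number of values per robot) elements.
    """
    steps = len(log)
    if not 1 <= windows <= steps:
        raise ValueError("windows must be between 1 and the number of steps")
    n_values = len(log[0][0])
    characterisation = []
    for w in range(windows):
        start = w * steps // windows
        end = (w + 1) * steps // windows
        sums = [0.0] * n_values
        samples = 0
        for t in range(start, end):
            for robot_state in log[t]:
                for i, value in enumerate(robot_state):
                    sums[i] += value
                samples += 1
        characterisation.extend(s / samples for s in sums)
    return characterisation


# Hypothetical log: 4 time steps, 2 robots, 2 values per robot.
log = [
    [[0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
one_window = sampled_average_state(log, 1)   # [0.75, 0.5]
two_windows = sampled_average_state(log, 2)  # [0.5, 1.0, 1.0, 0.0]
```

With W = 1 the change of behaviour halfway through the run is averaged away, while W = 2 preserves it at the cost of a characterisation twice as long.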

The reason for the different impact of the W value in dif- 
ferent tasks is still not clear. Our hypothesis is that the 
difference is due to the degree of behaviour regularity neces- 
sary to solve each task. The aggregation task can be solved 
using a regular pattern of behaviour, almost a reactive ap- 
proach. As such, a low W value might be sufficient to ad- 
equately characterise the behaviour of the swarm. On the 
other hand, the resource sharing task requires a more se- 
quential behaviour, which involves first finding the charging 
station, and then a different behaviour for sharing it with 
the other robots. As a consequence, higher W values might
be preferred, as they allow the sequential component of the 
behaviour to be adequately captured. Further experiments, 
with different tasks, are required to confirm or reject this 
hypothesis. 

6. CONCLUSION 

We proposed two generic similarity measures for the do- 
main of evolutionary swarm robotics, and used them to drive 
novelty search. The proposed measures rely on the principle 
that by analysing the microscopic behaviour of the robots 
of the swarm, it is possible to obtain a characterisation of 
the swarm behaviour as a whole. The microscopic behaviour
of each robot is exclusively based on the sensor and effector 
values of the robots, keeping the characterisation completely 
independent from the experimenter's domain knowledge. 

The proposed similarity measures were tested in two dis- 
tinct tasks, and compared with carefully crafted domain 
dependent similarity measures. The results showed that 
the performance obtained with the generic measures is just 
slightly inferior to the performance obtained with the do- 
main dependent measures, regarding the quality of the 
evolved solutions. In each task, the highest scoring generic 
measures were not significantly worse than the domain de- 
pendent measure. Furthermore, the results show that the 
advantages of novelty search identified in previous work [7] 
hold with the generic measures: novelty search excelled at 
bootstrapping the evolutionary process, and was successful 
in circumventing deception. 

In the comparison between the proposed generic similarity
measures, we found that the sampled average state achieved
the best results in both tasks. However, from a general per- 
spective, this measure is associated with a number of limita- 
tions: (i) the characterisations can become too long if there 
is a high number of sensors/effectors and a high value of W 
is necessary; (ii) it is not applicable to tasks where simu- 
lations can have different lengths; and (iii) in tasks where 
the robots of the swarm are performing different sub-tasks 
at the same time, averaging the sensor-effector states of all 
robots can result in a meaningless characterisation. On the 
other hand, the combined state count measure does not suf- 
fer from these limitations, despite the inferior performance 
verified in the two tasks presented in this paper. As such, 
we contend that the state count approach should not be dis- 
carded, and it should be further improved in future work. 
More experiments, with different tasks, are also needed in 
order to determine how well our results generalise, and clarify
which measures are more suitable for each type of task.

The use of novelty search with generic behaviour similarity
measures, in combination with traditional fitness-based
evolution, opens interesting possibilities in the domain of 
evolutionary swarm robotics. First, it facilitates the use of 
straightforward fitness functions. There is no need to shape 
the fitness function in order to avoid local maxima, since 
novelty search circumvents that issue, without depending 
on additional information provided by the experimenter. It 
is a step towards evolving complex solutions with minimal 
intervention from the experimenter. Second, generic mea- 
sures can potentially be used to unveil a true diversity of 
solutions based on self-organisation, with the evolved diver- 
sity not being conditioned by the experimenter. 